A Novel Algorithm for Association Rule Mining from Data with Incomplete and Missing Values
Abstract
Missing values and incomplete data are a natural phenomenon in real datasets. If association rules are mined from incomplete data without regard for missing values, erroneous rules are derived. The treatment of missing values and incomplete data is therefore important in association rule mining. This paper proposes a novel technique for mining association rules from data with missing values in large databases. The proposed technique is decomposed into two subproblems: a database scrutiny phase and a rule mining phase. The first phase re-examines the database to identify the transactions that are useful for mining frequent itemsets. The second phase mines frequent itemsets and constructs association rules from the resulting valid database. The proposed technique uses an Apriori-based algorithm. It is tested on synthetic and real datasets: T40I10D100K, Mushroom, Chess, and Heart disease prediction. Experimental results show that the proposed technique outperforms the robust association rule mining (RAR) algorithm and the Association rules from data with Missing values by Database Partitioning and Merging (AMDPM) algorithm.
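The two-phase structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the scrutiny phase decides which transactions are useful, so the rule used here is an assumption: missing values are marked with `None`, missing items are dropped, and a transaction is discarded only if no known item remains. The second phase is a textbook Apriori followed by confidence-based rule generation, not the authors' implementation. The names `scrutinize`, `apriori`, and `build_rules` are hypothetical.

```python
from itertools import combinations

MISSING = None  # assumed marker for a missing value


def scrutinize(transactions):
    """Phase 1 (assumed rule): drop missing items, discard empty transactions."""
    valid = []
    for t in transactions:
        known = frozenset(item for item in t if item is not MISSING)
        if known:
            valid.append(known)
    return valid


def apriori(transactions, min_support):
    """Phase 2a: textbook Apriori. Returns {itemset: support count}."""
    min_count = min_support * len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets, then prune candidates that have
        # an infrequent (k-1)-subset (downward closure).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


def build_rules(frequent, min_confidence):
    """Phase 2b: emit (antecedent, consequent, confidence) triples."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Toy data, invented for illustration only.
transactions = [
    ["bread", "milk", None],
    ["bread", "butter", "milk"],
    [None, None],
    ["bread", "butter"],
    ["milk", None, "butter"],
]
valid = scrutinize(transactions)
frequent = apriori(valid, min_support=0.5)
found = build_rules(frequent, min_confidence=0.6)
```

In this sketch the support threshold is computed over the valid database rather than the original one, which is one possible design choice; whether the paper normalizes support this way is not stated in the abstract.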
Similar resources
A Novel Method for Selecting the Supplier Based on Association Rule Mining
One of the important problems in supply chain management is supplier selection. A company holds massive amounts of data from various departments, so extracting knowledge from the company’s data is complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, and so data confidentiality is preserved ag...
Applying Ordinal Association Rules for Cleansing Data With Missing Values
Cleansing data of errors is an important processing step particularly when integrating heterogeneous data sources. Dirty data files are prevalent in data warehouses because of incorrect or missing data values, inconsistent attribute naming conventions or incomplete information. This paper improves the data cleansing ordinal association rules technique by proposing a solution for the missing val...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau data and finally used for estimating PD in banks. There is also continuing interest among banks in using rule-based classifiers to b...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach for evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...